document retrieval
- Asia > China (0.05)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Singapore (0.04)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.76)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)
A Neural Corpus Indexer for Document Retrieval
Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, in which the index is hard to optimize directly for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural network unifying the training and indexing stages can significantly improve the recall performance of traditional methods. To this end, we propose Neural Corpus Indexer (NCI), a sequence-to-sequence network that generates relevant document identifiers directly for a designated query. To optimize the recall performance of NCI, we invent a prefix-aware weight-adaptive decoder architecture, and leverage tailored techniques including query generation, semantic document identifiers, and consistency-based regularization. Empirical studies demonstrate the superiority of NCI on two commonly used academic benchmarks, achieving +21.4% and +16.8% relative improvements in Recall@1 on the NQ320k dataset and R-Precision on the TriviaQA dataset, respectively, compared to the best baseline method.
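The two metrics named in the abstract above, Recall@1 and R-Precision, follow standard definitions, and a "relative" gain is measured against the baseline score. The sketch below is a minimal illustration of those definitions only; the function names and the toy ranking are assumptions made here and are not taken from the NCI paper or its code.

```python
from typing import Sequence, Set


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def r_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:r] if doc_id in relevant_ids)
    return hits / r


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Toy ranking (hypothetical document ids) for a query with two relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print("Recall@1:", recall_at_k(ranked, relevant, k=1))
print("Recall@2:", recall_at_k(ranked, relevant, k=2))
print("R-Precision:", r_precision(ranked, relevant))
```

When every query has exactly one relevant document, Recall@1 reduces to the fraction of queries whose top result is correct.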
M3DR: Towards Universal Multilingual Multimodal Document Retrieval
Kolavi, Adithya S, Jain, Vyoman
Multimodal document retrieval systems have shown strong progress in aligning visual and textual content for semantic search. However, most existing approaches remain heavily English-centric, limiting their effectiveness in multilingual contexts. In this work, we present M3DR (Multilingual Multimodal Document Retrieval), a framework designed to bridge this gap across languages, enabling applicability across diverse linguistic and cultural contexts. M3DR leverages synthetic multilingual document data and generalizes across different vision-language architectures and model sizes, enabling robust cross-lingual and cross-modal alignment. Using contrastive training, our models learn unified representations for text and document images that transfer effectively across languages. We validate this capability on 22 typologically diverse languages, demonstrating consistent performance and adaptability across linguistic and script variations. We further introduce a comprehensive benchmark that captures real-world multilingual scenarios, evaluating models under monolingual, multilingual, and mixed-language settings. M3DR generalizes across both single dense vector and ColBERT-style token-level multi-vector retrieval paradigms. Our models, NetraEmbed and ColNetraEmbed, achieve state-of-the-art performance with ~150% relative improvements on cross-lingual retrieval.
- Asia > India > Karnataka > Bengaluru (0.40)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
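The M3DR entry above contrasts single dense vector retrieval with ColBERT-style token-level multi-vector retrieval. The sketch below shows how the two scoring rules typically differ, using the commonly described MaxSim formulation for late interaction. It is an illustration under stated assumptions, not the M3DR implementation: the function names and the toy two-dimensional embeddings are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_score(query_vec: Vector, doc_vec: Vector) -> float:
    """Single-vector retrieval: one embedding per query and one per document."""
    return dot(query_vec, doc_vec)


def late_interaction_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """ColBERT-style MaxSim: each query token keeps its best-matching document
    token (or image patch) similarity, and the maxima are summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy embeddings (hypothetical): two query tokens, two candidate documents.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [1.0, 0.0]]  # covers only the first query token

print("doc_a:", late_interaction_score(query_tokens, doc_a))
print("doc_b:", late_interaction_score(query_tokens, doc_b))
```

The multi-vector form stores one embedding per token or patch, so it trades index size for finer-grained matching than a single pooled vector can express.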
QBR: A Question-Bank-Based Approach to Fine-Grained Legal Knowledge Retrieval for the General Public
Yuan, Mingruo, Kao, Ben, Wu, Tien-Hsuan
Retrieval of legal knowledge by the general public is a challenging problem due to the technicality of the professional knowledge and laypersons' lack of fundamental understanding of the subject. Traditional information retrieval techniques assume that users are capable of formulating succinct and precise queries for effective document retrieval. In practice, however, the wide gap between the highly technical contents and untrained users makes legal knowledge retrieval very difficult. We propose a methodology, called QBR, which employs a Questions Bank (QB) as an effective medium for bridging the knowledge gap. We show how the QB is used to derive training samples to enhance the embedding of knowledge units within documents, which leads to effective fine-grained knowledge retrieval. We discuss and evaluate through experiments various advantages of QBR over traditional methods. These include more accurate, efficient, and explainable document retrieval, better comprehension of retrieval results, and highly effective fine-grained knowledge retrieval. We also present some case studies and show that QBR achieves social impact by helping citizens resolve everyday legal concerns.
- North America > United States (0.28)
- North America > Canada (0.04)
- Asia > China > Hong Kong (0.04)
- Law (1.00)
- Health & Medicine > Therapeutic Area (0.46)
- Government > Regional Government (0.46)
Attention Grounded Enhancement for Visual Document Retrieval
Cui, Wanqing, Huang, Wei, Guo, Yazhi, Hu, Yibo, Jin, Meiguang, Ma, Junfeng, Bi, Keping
Visual document retrieval requires understanding heterogeneous and multi-modal content to satisfy information needs. Recent advances use screenshot-based document encoding with fine-grained late interaction, significantly improving retrieval performance. However, retrievers are still trained with coarse global relevance labels, without revealing which regions support the match. As a result, retrievers tend to rely on surface-level cues and struggle to capture implicit semantic connections, hindering their ability to handle non-extractive queries. To alleviate this problem, we propose an Attention-Grounded REtriever Enhancement (AGREE) framework. AGREE leverages cross-modal attention from multimodal large language models as proxy local supervision to guide the identification of relevant document regions. During training, AGREE combines local signals with the global signals to jointly optimize the retriever, enabling it to learn not only whether documents match, but also which content drives relevance. Experiments on the challenging ViDoRe V2 benchmark show that AGREE significantly outperforms the global-supervision-only baseline. Quantitative and qualitative analyses further demonstrate that AGREE promotes deeper alignment between query terms and document regions, moving beyond surface-level matching toward more accurate and interpretable retrieval. Our code is available at: https://anonymous.4open.science/r/AGREE-2025.
- Asia > China > Beijing > Beijing (0.05)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
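The AGREE entry above describes combining a global relevance signal with local, attention-derived signals during training. One generic way to express such a joint objective is a weighted sum of a contrastive loss and a region-level alignment loss, sketched below. The abstract does not specify the loss terms, so everything here is an assumption for illustration: the InfoNCE-style global term, the KL-divergence local term, the `local_weight` parameter, and all function names are hypothetical and are not the authors' formulation.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_contrastive_loss(pos_score: float, neg_scores: Sequence[float]) -> float:
    """InfoNCE-style term (assumed): the positive document should outscore negatives."""
    probs = softmax([pos_score] + list(neg_scores))
    return -math.log(probs[0])


def local_alignment_loss(region_scores: Sequence[float], attention: Sequence[float]) -> float:
    """KL(attention || softmax(region_scores)) (assumed): pushes the retriever's
    per-region relevance toward the proxy attention distribution."""
    predicted = softmax(region_scores)
    return sum(a * math.log(a / p) for a, p in zip(attention, predicted) if a > 0)


def joint_loss(pos_score: float, neg_scores: Sequence[float],
               region_scores: Sequence[float], attention: Sequence[float],
               local_weight: float = 0.5) -> float:
    """Weighted sum of the global and local terms; `local_weight` is hypothetical."""
    return (global_contrastive_loss(pos_score, neg_scores)
            + local_weight * local_alignment_loss(region_scores, attention))


# Toy values: attention concentrates on region 0, the retriever prefers region 2.
attention = [0.7, 0.2, 0.1]
region_scores = [0.1, 0.3, 2.0]
print("joint loss:", joint_loss(2.0, [0.5, 0.1], region_scores, attention))
```

In this sketch the local term vanishes when the retriever's region distribution already matches the attention distribution, leaving only the global term.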
ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval
Masry, Ahmed, Thakkar, Megh, Bechard, Patrice, Madhusudhan, Sathwik Tejaswi, Awal, Rabiul, Mishra, Shambhavi, Suresh, Akshay Kalkunte, Daruru, Srivatsava, Hoque, Enamul, Gella, Spandana, Scholak, Torsten, Rajeswar, Sai
Retrieval-augmented generation has proven practical when models require specialized knowledge or access to the latest data. However, existing methods for multimodal document retrieval often replicate techniques developed for text-only retrieval, whether in how they encode documents, define training objectives, or compute similarity scores. To address these limitations, we present ColMate, a document retrieval model that bridges the gap between multimodal representation learning and document retrieval. ColMate utilizes a novel OCR-based pretraining objective, a self-supervised masked contrastive learning objective, and a late interaction scoring mechanism more relevant to multimodal document structures and visual characteristics. ColMate obtains a 3.61% improvement over existing retrieval models on the ViDoRe V2 benchmark, demonstrating stronger generalization to out-of-domain benchmarks.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Quebec > Montreal (0.04)
SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation
Barati, Iman, Amiri, Mostafa, Faili, Heshaam
Supervised Fine-Tuning (SFT) is essential for training large language models (LLMs), significantly enhancing critical capabilities such as instruction following and in-context learning. Nevertheless, creating suitable training datasets tailored for specific domains remains challenging due to unique domain constraints and data scarcity. In this paper, we propose SearchInstruct, an innovative method explicitly designed to construct high-quality instruction datasets for SFT. Our approach begins with a limited set of domain-specific, human-generated questions, which are systematically expanded using a large language model. Subsequently, domain-relevant resources are dynamically retrieved to generate accurate and contextually appropriate answers for each augmented question. Experimental evaluation demonstrates that SearchInstruct enhances both the diversity and quality of SFT datasets, leading to measurable improvements in LLM performance within specialized domains. Additionally, we show that beyond dataset generation, the proposed method can also effectively facilitate tasks such as model editing, enabling efficient updates to existing models. To facilitate reproducibility and community adoption, we provide full implementation details, the complete set of generated instruction-response pairs, and the source code in a publicly accessible Git repository: https://github.com/mostafaamiri/SearchInstruct
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Ukraine (0.04)
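The SearchInstruct entry above outlines three steps: expand a small set of seed questions with a language model, retrieve domain resources for each question, and generate an answer grounded in what was retrieved. The sketch below wires those steps together with pluggable callables. It is not the code from the linked repository: `build_instruction_dataset`, the `expand`/`retrieve`/`answer` parameters, the toy stand-ins, and the choice to keep the seed questions in the output are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def build_instruction_dataset(
    seed_questions: List[str],
    expand: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
) -> List[Dict[str, str]]:
    """Expand seeds, retrieve context per question, and produce grounded answers."""
    dataset: List[Dict[str, str]] = []
    seen = set()
    for seed in seed_questions:
        # Assumption: the seed itself is kept alongside its expansions.
        for question in [seed] + expand(seed):
            if question in seen:
                continue  # skip duplicate questions
            seen.add(question)
            context = retrieve(question)
            dataset.append({
                "instruction": question,
                "response": answer(question, context),
            })
    return dataset


# Toy stand-ins (hypothetical) for the LLM expansion, retriever, and answer generator.
def toy_expand(question: str) -> List[str]:
    return [question + " (rephrased)"]


def toy_retrieve(question: str) -> List[str]:
    return ["passage about: " + question]


def toy_answer(question: str, context: List[str]) -> str:
    return "Answer grounded in " + str(len(context)) + " passage(s)."


pairs = build_instruction_dataset(["What is a lease?"], toy_expand, toy_retrieve, toy_answer)
for pair in pairs:
    print(pair["instruction"], "->", pair["response"])
```

Passing the three stages in as callables keeps the sketch independent of any particular LLM or search backend; real components would replace the toy functions.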